BMLR Chapter 2
Cornell College
STA 363 Fall 2024 Block 1
Describe the concept of a likelihood
Construct the likelihood for a simple model
Define the Maximum Likelihood Estimate (MLE) and use it to answer an analysis question
Identify three ways to calculate or approximate the MLE and apply these methods to find the MLE for a simple model
Use likelihoods to compare models (next week)
A likelihood is a function that tells us how likely we are to observe our data for a given parameter value (or values).
Unlike Ordinary Least Squares (OLS), likelihood-based methods do not require the responses to be independent, identically distributed, and normal (iidN).
They are not the same as probability functions
Probability function: Fixed parameter value(s) + input possible outcomes \(\Rightarrow\) probability of seeing the different outcomes given the parameter value(s)
Likelihood: Fixed data + input possible parameter values \(\Rightarrow\) probability of seeing the fixed data for each parameter value
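The contrast between the two can be sketched in a few lines of Python. This is only an illustration: the helper `seq_prob`, the parameter value 0.6, and the candidate values of the parameter are assumptions chosen for the example, not part of the course materials.

```python
from itertools import product

def seq_prob(seq, p_home):
    """Probability of an ordered foul sequence such as "HHV" when each foul
    is independently called on the home team with probability p_home."""
    prob = 1.0
    for foul in seq:
        prob *= p_home if foul == "H" else 1 - p_home
    return prob

# Probability function: fix the parameter, vary the outcome.
sequences = ["".join(s) for s in product("HV", repeat=3)]
probs_fixed_p = {s: seq_prob(s, 0.6) for s in sequences}
total_over_outcomes = sum(probs_fixed_p.values())  # sums to 1 (up to rounding)

# Likelihood: fix the data ("HHV"), vary the parameter.
candidate_p = [0.2, 0.4, 0.6, 0.8]
lik_fixed_data = {p: seq_prob("HHV", p) for p in candidate_p}
total_over_params = sum(lik_fixed_data.values())   # need not sum to 1

print(total_over_outcomes, total_over_params)
```

For a fixed parameter, the probabilities over all possible outcomes sum to 1. For fixed data, the likelihood values over different parameter values carry no such constraint, which is one way to see that a likelihood is not a probability function.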
The data set 04-refs.csv includes 30 randomly selected NCAA men’s basketball games played in the 2009 - 2010 season.
We will focus on the variables foul1, foul2, and foul3, which indicate which team the 1st, 2nd, and 3rd fouls were called on, respectively.
- H: Foul was called on the home team
- V: Foul was called on the visiting team
We are focusing on the first three fouls for this analysis, but this could easily be extended to include all fouls in a game.
The dataset was derived from basketball0910.csv used in BMLR Section 11.2.
| game | date | visitor | hometeam | foul1 | foul2 | foul3 |
|---|---|---|---|---|---|---|
| 166 | 20100126 | CLEM | BC | V | V | V |
| 224 | 20100224 | DEPAUL | CIN | H | H | V |
| 317 | 20100109 | MARQET | NOVA | H | H | H |
| 214 | 20100228 | MARQET | SETON | V | V | H |
| 278 | 20100128 | SETON | SFL | H | V | V |
We will treat the games as independent in this analysis.
Model 1 (Unconditional Model): What is the probability the referees call a foul on the home team, assuming foul calls within a game are independent?
Model 2 (Conditional Model):
- Is there a tendency for the referees to call more fouls on the visiting team or home team?
- Is there a tendency for referees to call a foul on the team that already has more fouls?
Ultimately we want to decide which model is better.
Let \(p_H\) be the probability the referees call a foul on the home team.
The likelihood for a single observation (one game) is
\[Lik(p_H) = p_H^{y_i}(1 - p_H)^{n_i - y_i}\]
where \(y_i\) is the number of fouls called on the home team in game \(i\) and \(n_i\) is the number of fouls observed in that game.
(In this example, we know \(n_i = 3\) for all observations.)
Example
For a single game in which the first three fouls are \(H, H, V\), the likelihood is
\[Lik(p_H) = p_H^{2}(1 - p_H)^{3 - 2} = p_H^{2}(1 - p_H)\]
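As a quick numerical check of this example, the single-game likelihood can be evaluated at a few candidate values of \(p_H\). The function name `lik_single_game` and the candidate values are hypothetical, chosen only for illustration.

```python
def lik_single_game(p_home, y, n=3):
    """Likelihood contribution p^y (1 - p)^(n - y) for one game with
    y home-team fouls among the first n fouls."""
    return p_home**y * (1 - p_home)**(n - y)

# Game with fouls H, H, V: y = 2 home-team fouls out of n = 3.
for p in (0.25, 0.50, 0.75):
    print(p, lik_single_game(p, y=2))
```

Among these three candidates, the observed sequence is most likely under the largest value of \(p_H\), which is consistent with two of the three fouls being called on the home team.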
| Foul1 | Foul2 | Foul3 | Number of games | Likelihood Contribution |
|---|---|---|---|---|
| H | H | H | 3 | \(p_H^3\) |
| H | H | V | 2 | \(p_H^2(1 - p_H)\) |
| H | V | H | 3 | \(p_H^2(1 - p_H)\) |
| H | V | V | 7 | A |
| V | H | H | 7 | B |
| V | H | V | 1 | \(p_H(1 - p_H)^2\) |
| V | V | H | 5 | \(p_H(1 - p_H)^2\) |
| V | V | V | 2 | \((1 - p_H)^3\) |
Fill in A and B.
Because the observations (the games) are independent, the likelihood is
\[Lik(p_H) = \prod_{i=1}^{n}p_H^{y_i}(1 - p_H)^{3 - y_i}\]
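The product over games can be sketched in Python using the number of games observed for each foul sequence in the table above. The names `GAME_COUNTS` and `lik_model1` are hypothetical, and Python itself is an assumption here rather than the course's software.

```python
# Number of games observed for each ordered sequence of the first three fouls,
# copied from the table above (30 games in total).
GAME_COUNTS = {"HHH": 3, "HHV": 2, "HVH": 3, "HVV": 7,
               "VHH": 7, "VHV": 1, "VVH": 5, "VVV": 2}

def lik_model1(p_home, counts=GAME_COUNTS):
    """Model 1 likelihood: product over games of p^y (1 - p)^(3 - y)."""
    lik = 1.0
    for seq, n_games in counts.items():
        y = seq.count("H")  # home-team fouls in this sequence
        lik *= (p_home**y * (1 - p_home)**(3 - y)) ** n_games
    return lik

# Totals across all games: the exponents of p and (1 - p) in the product.
total_home = sum(seq.count("H") * n for seq, n in GAME_COUNTS.items())
total_fouls = 3 * sum(GAME_COUNTS.values())
print(total_home, total_fouls - total_home, lik_model1(0.5))
```

Because every game contributes factors of the same two quantities, the product collapses to \(p_H\) raised to the total number of home-team fouls times \((1 - p_H)\) raised to the total number of visiting-team fouls.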
We will use this function to find the maximum likelihood estimate (MLE). The MLE is the value of \(p_H\) between 0 and 1 that makes the observed data most likely.
A. 0.489
B. 0.500
C. 0.511
D. 0.556
There are three primary ways to find the MLE
✅ Approximate using a graph
✅ Numerical approximation
✅ Using calculus
Specify a finite set of possible values for \(p_H\) and calculate the likelihood for each value.
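A minimal sketch of this grid approach in Python follows. The totals of 46 home-team and 44 visiting-team fouls are tallied from the sequence table. The grid spacing of 0.001 and the names `lik`, `grid`, and `best_p` are assumptions made for illustration.

```python
# Totals tallied from the sequence table: 46 home-team fouls, 44 visiting-team fouls.
HOME_FOULS, VISITOR_FOULS = 46, 44

def lik(p):
    """Model 1 likelihood as a function of p_H."""
    return p**HOME_FOULS * (1 - p)**VISITOR_FOULS

# Finite set of candidate values strictly between 0 and 1.
grid = [i / 1000 for i in range(1, 1000)]

# The grid value with the largest likelihood approximates the MLE.
best_p = max(grid, key=lik)
print(best_p, lik(best_p))
```

Plotting `lik` over `grid` gives the graphical version of the same idea. A finer grid gives a closer approximation at the cost of more evaluations.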
Find the MLE by taking the first derivative of the likelihood function, setting it equal to 0, and solving for \(p_H\).
This can be tricky because of the Product Rule, so we can maximize the log(Likelihood) instead. The same value maximizes the likelihood and the log(Likelihood).
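The claim that the likelihood and the log(Likelihood) share a maximizer can be checked numerically without calculus. The sketch below compares the two on a grid and then refines the estimate with a ternary search, one simple numerical approximation method that is swapped in here for illustration and is not necessarily the one used in the course. All names are hypothetical.

```python
from math import log

# Totals tallied from the sequence table: 46 home-team fouls, 44 visiting-team fouls.
HOME_FOULS, VISITOR_FOULS = 46, 44

def lik(p):
    return p**HOME_FOULS * (1 - p)**VISITOR_FOULS

def log_lik(p):
    return HOME_FOULS * log(p) + VISITOR_FOULS * log(1 - p)

# Same maximizer on a grid: log is increasing, so the ordering is preserved.
grid = [i / 1000 for i in range(1, 1000)]
same_maximizer = max(grid, key=lik) == max(grid, key=log_lik)

def maximize_unimodal(f, lo, hi, tol=1e-6):
    """Ternary search for the maximizer of a function with a single peak."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1   # the peak lies to the right of m1
        else:
            hi = m2   # the peak lies to the left of m2
    return (lo + hi) / 2

p_hat = maximize_unimodal(log_lik, 1e-6, 1 - 1e-6)
print(same_maximizer, p_hat)
```

Working on the log scale also avoids the very small numbers that arise when many probabilities are multiplied together.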

Since calculus is not a pre-req, we will forgo this quest.
Is there a tendency for the referees to call more fouls on the visiting team or home team?
Is there a tendency for referees to call a foul on the team that already has more fouls?
Define new parameters:
\(p_{H|N}\): Probability referees call foul on home team given there are equal numbers of fouls on the home and visiting teams
\(p_{H|H Bias}\): Probability referees call foul on home team given there are more prior fouls on the home team
\(p_{H|V Bias}\): Probability referees call foul on home team given there are more prior fouls on the visiting team
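| Foul1 | Foul2 | Foul3 | Number of games | Likelihood Contribution |
|---|---|---|---|---|
| H | H | H | 3 | \((p_{H\vert N})(p_{H\vert H Bias})(p_{H\vert H Bias})\) |
| H | H | V | 2 | \((p_{H\vert N})(p_{H\vert H Bias})(1 - p_{H\vert H Bias})\) |
| H | V | H | 3 | \((p_{H\vert N})(1 - p_{H\vert H Bias})(p_{H\vert N})\) |
| H | V | V | 7 | A |
| V | H | H | 7 | B |
| V | H | V | 1 | \((1 - p_{H\vert N})(p_{H\vert V Bias})(1 - p_{H\vert N})\) |
| V | V | H | 5 | \((1 - p_{H\vert N})(1 - p_{H\vert V Bias})(p_{H\vert V Bias})\) |
| V | V | V | 2 | \((1 - p_{H\vert N})(1 - p_{H\vert V Bias})(1 - p_{H\vert V Bias})\) |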
Fill in A and B
\[\begin{aligned}Lik(p_{H| N}, p_{H|H Bias}, p_{H |V Bias}) &= [(p_{H| N})^{25}(1 - p_{H|N})^{23}(p_{H| H Bias})^8 \\ &(1 - p_{H| H Bias})^{12}(p_{H| V Bias})^{13}(1-p_{H|V Bias})^9]\end{aligned}\]
(Note: The exponents sum to 90, the total number of fouls in the data)
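The exponents can be recovered by walking through each foul sequence, tracking which team has more prior fouls, and counting how often each outcome occurs in each state. The sketch below does this in Python. The names `GAME_COUNTS` and `tally_model2` are hypothetical, and the state labels follow the parameter definitions above.

```python
from collections import Counter

# Number of games observed for each ordered sequence of the first three fouls.
GAME_COUNTS = {"HHH": 3, "HHV": 2, "HVH": 3, "HVV": 7,
               "VHH": 7, "VHV": 1, "VVH": 5, "VVV": 2}

def tally_model2(counts=GAME_COUNTS):
    """Count fouls by (state before the foul, team the foul was called on)."""
    tallies = Counter()
    for seq, n_games in counts.items():
        home, visitor = 0, 0          # prior fouls on each team
        for foul in seq:
            if home == visitor:
                state = "N"           # equal numbers of prior fouls
            elif home > visitor:
                state = "H Bias"      # more prior fouls on the home team
            else:
                state = "V Bias"      # more prior fouls on the visiting team
            tallies[(state, foul)] += n_games
            if foul == "H":
                home += 1
            else:
                visitor += 1
    return tallies

tallies = tally_model2()
for key in sorted(tallies):
    print(key, tallies[key])
```

Each tally for a foul on the home team is the exponent of the corresponding \(p\), and each tally for a foul on the visiting team is the exponent of the corresponding \(1 - p\).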
\[\begin{aligned}\log (Lik(p_{H| N}, p_{H|H Bias}, p_{H |V Bias})) &= 25 \log(p_{H| N}) + 23 \log(1 - p_{H|N}) \\ & + 8 \log(p_{H| H Bias}) + 12 \log(1 - p_{H| H Bias})\\ &+ 13 \log(p_{H| V Bias}) + 9 \log(1-p_{H|V Bias})\end{aligned}\]
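The log-likelihood is a sum of three pieces, each involving only one parameter, so each parameter can be maximized separately. The Python sketch below evaluates the log-likelihood and approximates each MLE on a grid. The names `EXPONENTS`, `log_lik_model2`, and `grid_mle`, and the grid spacing of 0.001, are assumptions made for illustration.

```python
from math import log

# (home-team fouls, visiting-team fouls) in each state: the exponents in the likelihood.
EXPONENTS = {"N": (25, 23), "H Bias": (8, 12), "V Bias": (13, 9)}

def log_lik_model2(p_n, p_hbias, p_vbias):
    """Model 2 log-likelihood for the three conditional probabilities."""
    params = {"N": p_n, "H Bias": p_hbias, "V Bias": p_vbias}
    return sum(h * log(params[k]) + v * log(1 - params[k])
               for k, (h, v) in EXPONENTS.items())

def grid_mle(home, visitor, steps=1000):
    """Grid approximation to the maximizer of home*log(p) + visitor*log(1 - p)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: home * log(p) + visitor * log(1 - p))

# Each term involves a single parameter, so maximize them one at a time.
estimates = {k: grid_mle(h, v) for k, (h, v) in EXPONENTS.items()}
print(estimates)
print(log_lik_model2(estimates["N"], estimates["H Bias"], estimates["V Bias"]))
```

Comparing the maximized log-likelihood here with the Model 1 log-likelihood at its MLE is one informal way to begin thinking about which model describes the data better.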
\(\hat{p}_H\) is greater than \(\hat{p}_{H\vert H Bias}\) and \(\hat{p}_{H \vert V Bias}\)
\(\hat{p}_{H\vert H Bias}\) is greater than \(\hat{p}_H\) and \(\hat{p}_{H \vert V Bias}\)
\(\hat{p}_{H\vert V Bias}\) is greater than \(\hat{p}_H\) and \(\hat{p}_{H \vert H Bias}\)
They are all approximately equal.
\(\hat{p}_H\) is greater than \(\hat{p}_{H\vert H Bias}\)
\(\hat{p}_{H\vert H Bias}\) is greater than \(\hat{p}_H\)
They are approximately equal.
These slides are based on content in BMLR: Chapter 2 - Beyond Least Squares: Using Likelihoods
Initial versions of the slides are by Dr. Maria Tackett, Duke University